Data Augmentation for Arabic Speech Recognition Based on End-to-End Deep Learning
Abstract
The end-to-end deep learning approach has greatly enhanced the performance of speech recognition systems. With deep learning techniques, overfitting is still the main problem when little data is available. Data augmentation is a suitable solution for the overfitting problem; it is adopted to increase the quantity of training data and enhance the robustness of the models. In this paper, we investigate a data augmentation method for enhancing Arabic automatic speech recognition (ASR) based on end-to-end deep learning. Data augmentation is applied to the original corpus, increasing the training data by applying noise adaptation, pitch-shifting, and speed transformation. A CNN-LSTM and an attention-based encoder-decoder are included in building the acoustic model and in the decoding phase. This method is considered state of the art in end-to-end deep learning and, to the best of our knowledge, no prior research has employed data augmentation for CNN-LSTM and attention-based models in Arabic ASR systems. In addition, the language model is built using RNN-LM and LSTM-LM methods. The Standard Arabic Single Speaker Corpus (SASSC) without diacritics is used as the original corpus. Experimental results show that applying data augmentation improved the word error rate (WER) compared with the same approach without data augmentation. The achieved average reduction in WER is 4.55%.
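The augmentation step named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 20 dB signal-to-noise ratio, the speed factors 0.9 and 1.1, and the use of white Gaussian noise are all assumptions chosen for illustration. True pitch-shifting (changing pitch while keeping duration) is left out because it needs a time-stretching step; the speed change shown here alters pitch only as a side effect of resampling.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at roughly the requested SNR in dB (assumed noise type)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def change_speed(signal, factor):
    """Resample by linear interpolation; factor > 1 shortens (speeds up) the signal.

    Played back at the original sample rate, the result is also higher in pitch.
    """
    n_out = int(round(len(signal) / factor))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


def augment(signal, rng, snr_db=20.0, speed_factors=(0.9, 1.1)):
    """Return the original, one noisy copy, and one copy per speed factor."""
    copies = [signal, add_noise(signal, snr_db, rng)]
    copies += [change_speed(signal, f) for f in speed_factors]
    return copies


# Hypothetical input: one second of a 220 Hz tone sampled at 16 kHz.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)

augmented = augment(tone, rng)
for clip in augmented:
    print(len(clip))
```

Each utterance yields several training examples that share one transcript, which is how augmentation enlarges a small corpus without new recordings.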
Similar resources
Deep Speech: Scaling up end-to-end speech recognition
We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model backgro...
End-to-End Deep Neural Network for Automatic Speech Recognition
We investigate the efficacy of deep neural networks on speech recognition. Specifically, we implement an end-to-end deep learning system that utilizes mel-filter bank features to directly output spoken phonemes without the need for a traditional Hidden Markov Model for decoding. The system will comprise two variants of neural networks for phoneme recognition. In particular, we utilize conv...
End-to-End Deep Learning for Driver Distraction Recognition
In this paper, an end-to-end deep learning solution for driver distraction recognition is presented. In the proposed framework, the features from the pre-trained convolutional neural network VGG-19 are extracted. Despite the variation in illumination conditions, camera position, driver’s ethnicity, and genders in our dataset, our best fine-tuned model, VGG-19, has achieved the highest test accuracy...
Robust end-to-end deep audiovisual speech recognition
Speech is one of the most effective ways of communication among humans. Even though audio is the most common way of transmitting speech, very important information can be found in other modalities, such as vision. Vision is particularly useful when the acoustic signal is corrupted. Multi-modal speech recognition, however, has not yet found widespread use, mostly because the temporal alignment an...
End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum
In this paper, we propose an end-to-end deep learning framework to detect speech paralinguistics using perception aware spectrum as input. Existing studies show that speech under cold has distinct variations of energy distribution on low frequency components compared with the speech under ‘healthy’ condition. This motivates us to use perception aware spectrum as the input to an end-to-end learn...
Journal
Journal title: International Journal of Intelligent Computing and Information Sciences
Year: 2021
ISSN: 1687-109X, 2535-1710
DOI: https://doi.org/10.21608/ijicis.2021.73581.1086